Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: fix tuple IS NULL logic #48299

Merged
merged 1 commit into from
May 15, 2020
Merged

Conversation

mgartner
Copy link
Collaborator

@mgartner mgartner commented May 1, 2020

Previously, we treated all cases of x IS NULL as x IS NOT DISTINCT FROM NULL, and all cases of x IS NOT NULL as x IS DISTINCT FROM NULL. However, these transformations are not equivalent when x is a
tuple.

If all elements of x are NULL, then x IS NULL should evaluate to
true, but x IS DISTINCT FROM NULL should evaluate to false. If one
element of x is NULL and one is not null, then x IS NOT NULL
should evaluate to false, but x IS DISTINCT FROM NULL should evaluate
to true. Therefore, they are not equivalent.

Below is a table of the correct semantics for tuple expressions.

Tuple IS NOT DISTINCT FROM NULL IS NULL IS DISTINCT FROM NULL IS NOT NULL
(1, 1) false false true true
(1, NULL) false false true false
(NULL, NULL) false true true false

Notice that IS NOT DISTINCT FROM NULL is always the inverse of
IS DISTINCT FROM NULL. However, IS NULL and IS NOT NULL are not
inverses given the tuple (1, NULL).

This commit introduces new tree expressions for IS NULL and IS NOT NULL. These operators have evaluation logic that is different from IS NOT DISTINCT FROM NULL and IS DISTINCT FROM NULL, respectively. While
an expression such as x IS NOT DISTINCT FROM NULL is parsed as a
tree.ComparisonExpr with a tree.IsNotDisinctFrom operator,
execbuiler will output the simpler tree.IsNullExpr when the two
expressions are equivalent - when x is not a tuple.

This commit also introduces new optimizer expression types,
IsTupleNull and IsTupleNotNull. Normalization rules have been added
for folding these expressions into boolean values when possible.

Fixes #46675
Informs #46908
Informs #12022

Release note (bug fix): Fixes incorrect logic for IS NULL and IS NOT NULL operators with tuples, correctly differentiating them from IS NOT DISTINCT FROM NULL and IS DISTINCT FROM NULL, respectively.

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@mgartner
Copy link
Collaborator Author

mgartner commented May 1, 2020

@RaduBerinde Here's my first stab at the tuple IS NULL bug. You don't need to look too thoroughly, but it would be great if you could let me know if I'm on the right track. Note I've only made the changes for IS NULL not IS NOT NULL—I want to make sure I'm on the right track first.

Copy link
Member

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah this is definitely on the right track!

I really like the "unary" tree operators

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @mgartner)


pkg/sql/opt/norm/fold_constants.go, line 73 at r1 (raw file):

	tup := input.(*memo.TupleExpr)
	for _, e := range tup.Elems {
		if e.Op() != opt.NullOp {

What if the input is a variable or some other non-const expression? It could still be Null after evaluation


pkg/sql/opt/norm/fold_constants.go, line 82 at r1 (raw file):

// IsNonNullTuple returns true if the input tuple has one non-null constant
// value.
func (c *CustomFuncs) IsNonNullTuple(input opt.ScalarExpr) bool {

These methods should document that they are meant to have yes/maybe semantics (ie allow false positives).

I think it's fine to fold (1,x) IS NULL to false. But not (x,y)..

@mgartner mgartner force-pushed the is-null-tuples branch 12 times, most recently from 32f0eaf to bacea2d Compare May 8, 2020 19:11
Copy link
Collaborator Author

@mgartner mgartner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @mgartner)


pkg/sql/logictest/testdata/logic_test/alter_table, line 933 at r2 (raw file):

INSERT INTO t VALUES (1), (NULL)

statement error validation of NOT NULL constraint failed: validation of CHECK "a IS DISTINCT FROM NULL" failed

@RaduBerinde The deeper I get into this, the more I think this is not a great solution. For example, I don't love how this error message changes, but I don't see a way around it.

Copy link
Collaborator Author

@mgartner mgartner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @mgartner)


pkg/sql/logictest/testdata/logic_test/alter_table, line 933 at r2 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

@RaduBerinde The deeper I get into this, the more I think this is not a great solution. For example, I don't love how this error message changes, but I don't see a way around it.

Maybe we can convert an Is/IsNot back into a tree.IsNullExpr/tree.IsNotNullExpr when the right side is a null in execbuilder? I'm not sure if that's safe, though.

Copy link
Member

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @mgartner)


pkg/sql/logictest/testdata/logic_test/alter_table, line 933 at r2 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

Maybe we can convert an Is/IsNot back into a tree.IsNullExpr/tree.IsNotNullExpr when the right side is a null in execbuilder? I'm not sure if that's safe, though.

Yeah I think we should do that (but only if the type of the left side is not of tuple type).

@mgartner mgartner force-pushed the is-null-tuples branch 11 times, most recently from 8d2271c to f55dcbd Compare May 10, 2020 23:27
@cockroachdb cockroachdb deleted a comment from blathers-crl bot May 11, 2020
@mgartner mgartner marked this pull request as ready for review May 11, 2020 00:58
@mgartner mgartner requested a review from a team as a code owner May 11, 2020 00:58
@mgartner mgartner requested a review from yuzefovich May 11, 2020 00:58
Copy link
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work!

nit: I usually don't link the issue numbers in the commit message and do link them only in the PR description. This way those issues don't get a lot of spam whenever the commits are amended.

Reviewed 2 of 54 files at r2, 15 of 51 files at r3.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @mgartner and @yuzefovich)


pkg/sql/colexec/execplan.go, line 1113 at r3 (raw file):

	}
	err = ppr.planPostProcessSpec(ctx, flowCtx, post, streamingMemAccount)
	if err != nil && processorConstructor == nil {

Why do you want to add this? I think it's a good idea, but I would like to know what prompted you to add this. In our unit tests we "expect" that the wrapping might fail (with a panic), and the unit tests are expected to catch those panics.


pkg/sql/colexec/execplan.go, line 1485 at r3 (raw file):

			ctx, evalCtx, t.TypedInnerExpr(), columnTypes, input, acc,
		)
		_, negate := expr.(*tree.IsNotNullExpr)

nit: this will always be false.


pkg/sql/colexec/execplan.go, line 1525 at r3 (raw file):

					return nil, resultIdx, ct, internalMemUsed, err
				}
				// IS NULL is replaced with IS NOT DISTINCT FROM NULL, so we want to

nit: this comment is no longer valid and should be removed.


pkg/sql/colexec/execplan.go, line 1625 at r3 (raw file):

	case *tree.BinaryExpr:
		return planProjectionExpr(ctx, evalCtx, t.Operator, t.ResolvedType(), t.TypedLeft(), t.TypedRight(), columnTypes, input, acc)
	case *tree.IsNullExpr:

nit: I'd probably extract a helper function here to remove duplication.


pkg/sql/colexec/execplan.go, line 1680 at r3 (raw file):

		}
		typs = appendOneType(typs, t.ResolvedType())
		resultIdx = len(typs) - 1

super nit: you could move this line before appendOneType line and remove -1 part.

Copy link
Collaborator Author

@mgartner mgartner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR Yahor! And for the suggestion to keep issue links out of commits.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @RaduBerinde and @yuzefovich)


pkg/sql/colexec/execplan.go, line 1113 at r3 (raw file):

Previously, yuzefovich wrote…

Why do you want to add this? I think it's a good idea, but I would like to know what prompted you to add this. In our unit tests we "expect" that the wrapping might fail (with a panic), and the unit tests are expected to catch those panics.

As I started adding support for IsNullExpr and IsNotNull to colexec, the test cases in is_null_ops_test.go would fail with a nil pointer exception here:

proc, err := processorConstructor(

It took me quite some time to track down exactly what the problem was—was just the IsNullExpr logic missing, or were the tests also broken, or something else?

With this change, the "real" err is propagated in the output of the failing test, which looks like unable to columnarize filter expression "@1 IS NULL": unhandled selection expression type: *tree.IsNullExpr.

I don't feel too strongly, but I'd hate for someone to spend more time than needed chasing down this nil pointer exception when a new expression is added.


pkg/sql/colexec/execplan.go, line 1485 at r3 (raw file):

Previously, yuzefovich wrote…

nit: this will always be false.

Oops, good catch. Must have been some copy-pasta 🍝


pkg/sql/colexec/execplan.go, line 1525 at r3 (raw file):

Previously, yuzefovich wrote…

nit: this comment is no longer valid and should be removed.

I've updated it instead. I think it's worth keeping since I think it was attempting to explain why we negate in the case of IS DISTINCT FROM, instead of IS NOT DISTINCT FROM, since that's a bit confusing. What do you think?


pkg/sql/colexec/execplan.go, line 1625 at r3 (raw file):

Previously, yuzefovich wrote…

nit: I'd probably extract a helper function here to remove duplication.

Done.


pkg/sql/colexec/execplan.go, line 1680 at r3 (raw file):

Previously, yuzefovich wrote…

super nit: you could move this line before appendOneType line and remove -1 part.

Happy to do so, but can you explain the reason? I made this change here and throughout the file because it was easier for me to reason about resultIdx being the index of the last element of the newly expanded typs. But I don't have the context of the rest of the colexec code.

Would you like me to revert all the places I've made a similar change in this file?


pkg/sql/opt/norm/fold_constants.go, line 73 at r1 (raw file):

Previously, RaduBerinde wrote…

What if the input is a variable or some other non-const expression? It could still be Null after evaluation

Correct. That is why we return false, since we cannot guarantee that all elements are NULL.


pkg/sql/opt/norm/fold_constants.go, line 82 at r1 (raw file):

Previously, RaduBerinde wrote…

These methods should document that they are meant to have yes/maybe semantics (ie allow false positives).

I think it's fine to fold (1,x) IS NULL to false. But not (x,y)..

Done.

Copy link
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

colexec changes LGTM.

Reviewed 1 of 1 files at r4.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @mgartner and @RaduBerinde)


pkg/sql/colexec/execplan.go, line 1113 at r3 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

As I started adding support for IsNullExpr and IsNotNull to colexec, the test cases in is_null_ops_test.go would fail with a nil pointer exception here:

proc, err := processorConstructor(

It took me quite some time to track down exactly what the problem was—was just the IsNullExpr logic missing, or were the tests also broken, or something else?

With this change, the "real" err is propagated in the output of the failing test, which looks like unable to columnarize filter expression "@1 IS NULL": unhandled selection expression type: *tree.IsNullExpr.

I don't feel too strongly, but I'd hate for someone to spend more time than needed chasing down this nil pointer exception when a new expression is added.

Got it, I agree it's a good idea to add this. Could you please leave this TODO here for me?
TODO(yuzefovich): update unit tests to remove panic-catcher when fallback to rowexec is not allowed.


pkg/sql/colexec/execplan.go, line 1525 at r3 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

I've updated it instead. I think it's worth keeping since I think it was attempting to explain why we negate in the case of IS DISTINCT FROM, instead of IS NOT DISTINCT FROM, since that's a bit confusing. What do you think?

Nice, good point.


pkg/sql/colexec/execplan.go, line 1680 at r3 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

Happy to do so, but can you explain the reason? I made this change here and throughout the file because it was easier for me to reason about resultIdx being the index of the last element of the newly expanded typs. But I don't have the context of the rest of the colexec code.

Would you like me to revert all the places I've made a similar change in this file?

I think adding appendOneType helper is very good idea, my comment was to have

resultIdx = len(typs) - 1
typs = appendOneType(typs, t.ResolvedType())

and not the other way around.

For some reason, it reads more "naturally" to me, but it's no big deal. It's up to you whether you wanna do this line switch throughout this file.

Copy link
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @mgartner and @RaduBerinde)


pkg/sql/colexec/execplan.go, line 1680 at r3 (raw file):

Previously, yuzefovich wrote…

I think adding appendOneType helper is very good idea, my comment was to have

resultIdx = len(typs) - 1
typs = appendOneType(typs, t.ResolvedType())

and not the other way around.

For some reason, it reads more "naturally" to me, but it's no big deal. It's up to you whether you wanna do this line switch throughout this file.

I guess the reason for my such preference is that in this file we tend to have the following pattern

result index assignment
operator creation
type schema update

for example

	outputIdx := len(columnTypes)
	op, err = GetCastOperator(colmem.NewAllocator(ctx, acc), input, inputIdx, outputIdx, fromType, toType)
	typs = appendOneType(columnTypes, toType)

We don't follow the pattern religiously though, so it's not a problem if we deviate from it.

Copy link
Collaborator Author

@mgartner mgartner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @RaduBerinde and @yuzefovich)


pkg/sql/colexec/execplan.go, line 1113 at r3 (raw file):

Previously, yuzefovich wrote…

Got it, I agree it's a good idea to add this. Could you please leave this TODO here for me?
TODO(yuzefovich): update unit tests to remove panic-catcher when fallback to rowexec is not allowed.

Done.


pkg/sql/colexec/execplan.go, line 1680 at r3 (raw file):

Previously, yuzefovich wrote…

I guess the reason for my such preference is that in this file we tend to have the following pattern

result index assignment
operator creation
type schema update

for example

	outputIdx := len(columnTypes)
	op, err = GetCastOperator(colmem.NewAllocator(ctx, acc), input, inputIdx, outputIdx, fromType, toType)
	typs = appendOneType(columnTypes, toType)

We don't follow the pattern religiously though, so it's not a problem if we deviate from it.

Thanks, this context was helpful. I've made the changes to follow this pattern.

Copy link
Collaborator Author

@mgartner mgartner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @RaduBerinde and @yuzefovich)

a discussion (no related file):
@RaduBerinde I'm just now realizing that it appears that Postgres will normalize (a, b) IS NULL to ((a IS NULL) AND (b IS NULL)). I'm wondering if it would simplify this change if we did something similar.

I think we'd still need to parse the IS NULL into a tree.IsNullExpr to differentiate from a IsNotDistinctFrom comparison expr. But if we added normalization rules that removed the tuple completely, then execbuilder could avoid ever creating tree.IsNullExpr (it could panic if it ever saw a opt.IsTupleNullOp), and maybe we could remove a lot of the evaluation-related code in this PR (eval.go, colexec, etc.).

I'm curious what you think.


Copy link
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r5.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @RaduBerinde)

Copy link
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: This looks like it was a lot of work!

Reviewed 3 of 54 files at r2, 8 of 51 files at r3, 11 of 11 files at r6.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @mgartner and @RaduBerinde)

a discussion (no related file):

Previously, mgartner (Marcus Gartner) wrote…

@RaduBerinde I'm just now realizing that it appears that Postgres will normalize (a, b) IS NULL to ((a IS NULL) AND (b IS NULL)). I'm wondering if it would simplify this change if we did something similar.

I think we'd still need to parse the IS NULL into a tree.IsNullExpr to differentiate from a IsNotDistinctFrom comparison expr. But if we added normalization rules that removed the tuple completely, then execbuilder could avoid ever creating tree.IsNullExpr (it could panic if it ever saw a opt.IsTupleNullOp), and maybe we could remove a lot of the evaluation-related code in this PR (eval.go, colexec, etc.).

I'm curious what you think.

Removing the need for new execution code sounds like a good idea to me, but would be interested to hear @RaduBerinde's opinion.



pkg/sql/opt/norm/fold_constants.go, line 69 at r6 (raw file):

}

// HasOneNullElement returns true if the input tuple has one constant, null

one -> at least one


pkg/sql/opt/norm/fold_constants.go, line 73 at r6 (raw file):

// For example, given the tuple (1, x), it will return false because x is not
// guaranteed to be null.
func (c *CustomFuncs) HasOneNullElement(input opt.ScalarExpr) bool {

I'd change this to HasNullElement, since it can have more than one


pkg/sql/opt/norm/fold_constants.go, line 97 at r6 (raw file):

}

// HasOneNonNullElement returns true if the input tuple has one constant,

one -> at least one


pkg/sql/opt/norm/fold_constants.go, line 101 at r6 (raw file):

// be non-null. For example, given the tuple (NULL, x), it will return false
// because x is not guaranteed to be non-null.
func (c *CustomFuncs) HasOneNonNullElement(input opt.ScalarExpr) bool {

I'd change this to HasNonNullElement for the same reason as above


pkg/sql/opt/norm/fold_constants.go, line 104 at r6 (raw file):

	tup := input.(*memo.TupleExpr)
	for _, e := range tup.Elems {
		if e.Op() != opt.NullOp && (opt.IsConstValueOp(e) || e.Op() == opt.TupleOp || e.Op() == opt.ArrayOp) {

Does it matter whether the tuple/array is constant (or has at least one non-null constant)?

edit: I saw that you have tests showing that it doesn't matter, so might be worth giving that example here.


pkg/sql/opt/norm/testdata/rules/comp, line 411 at r6 (raw file):


norm expect=FoldNonNullTupleIsTupleNull
SELECT ((NULL, NULL), NULL) IS NULL AS r

So this is the same behavior as Postgres? Weird...


pkg/sql/opt/ops/scalar.opt, line 428 at r6 (raw file):


# Is maps to the IS NOT DISTINCT FROM operator which is equivalent to IS for
# non-tuples.

Might be worth adding a pointer to the tuple-specific operators above


pkg/sql/opt/ops/scalar.opt, line 436 at r6 (raw file):


# IsNot is the inverse of Is. It maps to the IS DISTINCT FROM operator which is
# equivalent to IS NOT for non-tuples.

ditto


pkg/sql/sem/tree/testdata/eval/is, line 362 at r6 (raw file):

false

# Tuple with no NULLS.

These tests make my brain hurt 😆

Copy link
Member

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 2 of 0 LGTMs obtained (waiting on @mgartner)

a discussion (no related file):

Previously, rytaft (Rebecca Taft) wrote…

Removing the need for new execution code sounds like a good idea to me, but would be interested to hear @RaduBerinde's opinion.

^We talked offline - such a rule would only apply for static tuples, not those that are the result of a more complex scalar expression (e.g. inside a case or from a subquery).

Fantastic job! :lgtm_strong: Thanks for working through some tedious details!

[nit] I'd mention in the commit message that we execbuild to the simpler IS (NOT) NULL variants when we know they are equivalent.



pkg/sql/logictest/testdata/logic_test/check_constraints, line 334 at r6 (raw file):


statement ok
CREATE TABLE t46675isnotnull (k int, a int, CHECK ((k, a) IS NOT NULL))

Nice test!


pkg/sql/opt/norm/fold_constants.go, line 73 at r1 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

Correct. That is why we return false, since we cannot guarantee that all elements are NULL.

👍 Great descriptions!


pkg/sql/opt/ops/scalar.opt, line 306 at r6 (raw file):


# IsTupleNull is the boolean expression with a tuple input that evaluates to
# true if all elements in the tuple are null.

if the tuple is null or all the elements in the tuple are null.


pkg/sql/opt/ops/scalar.opt, line 313 at r6 (raw file):


# IsTupleNotNull is the boolean expression with a tuple input that evaluates to
# true if all elements in the tuple are not null.

if the tuple is not null and all the elements are not null.


pkg/sql/sem/tree/expr.go, line 267 at r6 (raw file):

// IsNullExpr represents an IS NULL expression.
type IsNullExpr struct {

[nit] Mention that this is equivalent to IS NOT DISTINCT FROM NULL, except for tuples.


pkg/sql/sem/tree/expr.go, line 295 at r6 (raw file):

// IsNotNullExpr represents an IS NOT NULL expression.
type IsNotNullExpr struct {

[nit] Mention that this is equivalent to IS DISTINCT FROM NULL, except for tuples.


pkg/sql/sem/tree/expr.go, line 482 at r6 (raw file):

func (node *ComparisonExpr) Format(ctx *FmtCtx) {
	opStr := node.Operator.String()
	if node.Operator == IsDistinctFrom && (node.Right == DBoolTrue || node.Right == DBoolFalse) {

[nit] Add a comment here explaining that IS / IS NOT are equivalent when the RHS is true/false and we prefer the easier to read variant. (and copy it to the corresponding code in pretty.go)

Previously, we treated all cases of `x IS NULL` as `x IS NOT DISTINCT
FROM NULL`, and all cases of `x IS NOT NULL` as `x IS DISTINCT FROM
NULL`. However, these transformations are not equivalent when `x` is a
tuple.

If all elements of `x` are `NULL`, then `x IS NULL` should evaluate to
true, but `x IS DISTINCT FROM NULL` should evaluate to false. If one
element of `x` is `NULL` and one is not null, then `x IS NOT NULL`
should evaluate to false, but `x IS DISTINCT FROM NULL` should evaluate
to true. Therefore, they are not equivalent.

Below is a table of the correct semantics for tuple expressions.

| Tuple        | IS NOT DISTINCT FROM NULL | IS NULL   | IS DISTINCT FROM NULL | IS NOT NULL |
| ------------ | ------------------------- | --------- | --------------------- | ----------- |
| (1, 1)       | false                     | false     | true                  | true        |
| (1, NULL)    | false                     | **false** | true                  | **false**   |
| (NULL, NULL) | false                     | true      | true                  | false       |

Notice that `IS NOT DISTINCT FROM NULL` is always the inverse of
`IS DISTINCT FROM NULL`. However, `IS NULL` and `IS NOT NULL` are not
inverses given the tuple `(1, NULL)`.

This commit introduces new tree expressions for `IS NULL` and `IS NOT
NULL`. These operators have evaluation logic that is different from `IS
NOT DISTINCT FROM NULL` and `IS DISTINCT FROM NULL`, respectively. While
an expression such as `x IS NOT DISTINCT FROM NULL` is parsed as a
`tree.ComparisonExpr` with a `tree.IsNotDisinctFrom` operator,
execbuiler will output the simpler `tree.IsNullExpr` when the two
expressions are equivalent - when x is not a tuple.

This commit also introduces new optimizer expression types,
`IsTupleNull` and `IsTupleNotNull`. Normalization rules have been added
for folding these expressions into boolean values when possible.

Release note (bug fix): Fixes incorrect logic for `IS NULL` and `IS NOT
NULL` operators with tuples, correctly differentiating them from `IS NOT
DISTINCT FROM NULL` and `IS DISTINCT FROM NULL`, respectively.
Copy link
Collaborator Author

@mgartner mgartner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @RaduBerinde, @rytaft, and @yuzefovich)

a discussion (no related file):

Previously, RaduBerinde wrote…

^We talked offline - such a rule would only apply for static tuples, not those that are the result of a more complex scalar expression (e.g. inside a case or from a subquery).

Fantastic job! :lgtm_strong: Thanks for working through some tedious details!

[nit] I'd mention in the commit message that we execbuild to the simpler IS (NOT) NULL variants when we know they are equivalent.

Thanks for the reviews!



pkg/sql/opt/norm/fold_constants.go, line 69 at r6 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

one -> at least one

Done.


pkg/sql/opt/norm/fold_constants.go, line 73 at r6 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

I'd change this to HasNullElement, since it can have more than one

Done.


pkg/sql/opt/norm/fold_constants.go, line 97 at r6 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

one -> at least one

Done.


pkg/sql/opt/norm/fold_constants.go, line 101 at r6 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

I'd change this to HasNonNullElement for the same reason as above

Done.


pkg/sql/opt/norm/fold_constants.go, line 104 at r6 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

Does it matter whether the tuple/array is constant (or has at least one non-null constant)?

edit: I saw that you have tests showing that it doesn't matter, so might be worth giving that example here.

Correct. (k, (NULL, NULL)) IS NULL can be folded to false, because the IS NULL only cares after the first-level elements, not nested ones. An inner tuple will make this evaluate to false, regardless of its contents.

I've added some more commentary to try to make this more clear.


pkg/sql/opt/norm/testdata/rules/comp, line 411 at r6 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

So this is the same behavior as Postgres? Weird...

Ya it's confusing. A tuple filled with NULLs is null, but a tuple filled with tuples filled with NULLs is not null.


pkg/sql/opt/ops/scalar.opt, line 306 at r6 (raw file):

Previously, RaduBerinde wrote…

if the tuple is null or all the elements in the tuple are null.

Done.


pkg/sql/opt/ops/scalar.opt, line 313 at r6 (raw file):

Previously, RaduBerinde wrote…

if the tuple is not null and all the elements are not null.

Done.


pkg/sql/opt/ops/scalar.opt, line 428 at r6 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

Might be worth adding a pointer to the tuple-specific operators above

Done.


pkg/sql/opt/ops/scalar.opt, line 436 at r6 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

ditto

Done.


pkg/sql/sem/tree/expr.go, line 267 at r6 (raw file):

Previously, RaduBerinde wrote…

[nit] Mention that this is equivalent to IS NOT DISTINCT FROM NULL, except for tuples.

Done.


pkg/sql/sem/tree/expr.go, line 295 at r6 (raw file):

Previously, RaduBerinde wrote…

[nit] Mention that this is equivalent to IS DISTINCT FROM NULL, except for tuples.

Done.


pkg/sql/sem/tree/expr.go, line 482 at r6 (raw file):

Previously, RaduBerinde wrote…

[nit] Add a comment here explaining that IS / IS NOT are equivalent when the RHS is true/false and we prefer the easier to read variant. (and copy it to the corresponding code in pretty.go)

Done.


pkg/sql/sem/tree/testdata/eval/is, line 362 at r6 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

These tests make my brain hurt 😆

Haha, tell me about it...

@mgartner
Copy link
Collaborator Author

bors r+

@craig
Copy link
Contributor

craig bot commented May 15, 2020

Build succeeded

@craig craig bot merged commit 08107a4 into cockroachdb:master May 15, 2020
@mgartner mgartner deleted the is-null-tuples branch May 15, 2020 16:52
mgartner added a commit to mgartner/cockroach that referenced this pull request May 19, 2020
Logic for `IS NULL` and `IS NOT NULL` with tuple operands was recently
fixed in cockroachdb#48299. This commit adds tests for empty tuples as operands
that were forgotten in that commit. This makes the correct evaluation
logic of such an expression explicit and will help prevent regressions.

Release note: None
craig bot pushed a commit that referenced this pull request May 19, 2020
49242: sql: add tests for IS (NOT) NULL with empty tuples r=mgartner a=mgartner

Logic for `IS NULL` and `IS NOT NULL` with tuple operands was recently
fixed in #48299. This commit adds tests for empty tuples as operands
that were forgotten in that commit. This makes the correct evaluation
logic of such an expression explicit and will help prevent regressions.

Release note: None

49274: authors: add John Sheaffer to authors file r=sheaffej a=sheaffej

Release note: None

49275: Add alexl@ to AUTHORS file. r=lunevalex a=lunevalex

Release note: None


Co-authored-by: Marcus Gartner <[email protected]>
Co-authored-by: John Sheaffer <[email protected]>
Co-authored-by: Alex Lunev <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

sql: wrong implementation of NULL predicate for row value expressions
5 participants